02. The Big Picture

The Big Picture

Before digging into the details of policy gradient methods, we'll discuss how they work at a high level.

M3L3 C02 V6

## Quiz

Which of the following is true about how the policy gradient method will work? (Select all that apply.)

SOLUTION:
  • For each episode, if the agent won the game, we'll amend the policy network weights to make each (state, action) pair that appeared in the episode to make them more likely to repeat in future episodes.
  • For each episode, if the agent lost the game, we'll change the policy network weights to make it less likely to repeat the corresponding (state, action) pairs in future episodes.